Pairwise Statistical Significance Versus Database Statistical Significance for Local Alignment of Protein Sequences
Abstract
An important aspect of pairwise sequence comparison is assessing the statistical significance of the alignment. Most currently popular alignment programs report the statistical significance of an alignment in the context of a database search. This database statistical significance depends on the database, and hence the same alignment of a pair of sequences may be assigned different statistical significance values in different databases. In this paper, we explore the use of pairwise statistical significance, which is independent of any database and can be useful when we have only a pair of sequences and want to comment on their relatedness. We compared different methods and determined that censored maximum likelihood fitting of the score distribution to the right of the peak is the most accurate method for estimating pairwise statistical significance. We evaluated this method in an experiment with a subset of CATH2.3, which had previously been used by other authors as a benchmark data set for protein comparison. Comparison with the database statistical significance reported by popular programs such as SSEARCH and PSI-BLAST indicates that the results of pairwise statistical significance are comparable to, and sometimes significantly better than, those of database statistical significance (with SSEARCH). However, PSI-BLAST performs best, presumably due to its use of query-specific substitution matrices.
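As a rough illustration of the fitting step named in the abstract, the sketch below fits a Gumbel (extreme value) distribution by type-I left-censored maximum likelihood, using only scores at or above the histogram peak, and then converts an alignment score into a P-value. It is a minimal sketch under stated assumptions, not the authors' implementation: the Gumbel form of the null distribution, the function names, the use of the most frequent rounded score as the peak, and the simulated scores that stand in for real alignment scores are all assumptions made for this example.

```python
import math
import random


def histogram_peak(scores):
    """Most frequent rounded score, used here as the censoring cutoff (assumption)."""
    counts = {}
    for s in scores:
        k = round(s)
        counts[k] = counts.get(k, 0) + 1
    return max(counts, key=counts.get)


def fit_gumbel_censored(scores, cutoff):
    """Type-I left-censored ML fit of a Gumbel distribution.

    Scores below `cutoff` are only counted; their values are not used.
    Returns (mu, lam), where lam = 1 / scale.
    """
    kept = [s - cutoff for s in scores if s >= cutoff]  # shift for numerical stability
    n = len(kept)
    z = len(scores) - n  # number of censored observations
    if n < 2 or max(kept) == min(kept):
        raise ValueError("need at least two distinct scores at or above the cutoff")
    mean = sum(kept) / n

    def sums(lam):
        # Each censored score contributes as if it sat at the cutoff (shifted value 0).
        s, t = float(z), 0.0
        for x in kept:
            e = math.exp(-lam * x)
            s += e
            t += x * e
        return s, t

    def score_eq(lam):
        # Derivative of the profile log-likelihood with respect to lam, divided by n.
        s, t = sums(lam)
        return 1.0 / lam - mean + t / s

    # Bracket the root of score_eq, then bisect.
    lo, hi = 1e-6, 1.0
    while score_eq(hi) > 0:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score_eq(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    s, _ = sums(lam)
    mu = cutoff + math.log(n / s) / lam
    return mu, lam


def pairwise_pvalue(score, mu, lam):
    """P(S >= score) under the fitted Gumbel null: 1 - exp(-exp(-lam * (score - mu)))."""
    return -math.expm1(-math.exp(-lam * (score - mu)))


def simulate_null_scores(n, mu=30.0, scale=5.0, seed=0):
    """Stand-in for real null alignment scores: draws from a known Gumbel distribution."""
    rng = random.Random(seed)
    return [mu - scale * math.log(-math.log(max(rng.random(), 1e-300)))
            for _ in range(n)]


if __name__ == "__main__":
    null_scores = simulate_null_scores(20000)
    mu, lam = fit_gumbel_censored(null_scores, histogram_peak(null_scores))
    print(mu, lam, pairwise_pvalue(60.0, mu, lam))
```

In practice the null scores would come from real alignments rather than from `simulate_null_scores`, for example by aligning one sequence against shuffled versions of the other; that sampling scheme is also an assumption here, since the abstract does not describe it.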
Related articles
Sequence-specific sequence comparison using pairwise statistical significance.
There has been a deluge of biological sequence data in the public domain, which makes sequence comparison one of the most fundamental computational problems in bioinformatics. Biologists routinely use pairwise alignment programs to identify similar, or more specifically, related sequences (having a common ancestor). It is a well-known fact that almost everything in bioinformatics depends on t...
Enhancing Parallelism of Pairwise Statistical Significance Estimation for Local Sequence Alignment
Pairwise statistical significance (PSS) has been found to be able to accurately identify related sequences (homology detection), which is a fundamental step in numerous applications relating to sequence analysis. Although more accurate than database statistical significance, it is both computationally intensive and data intensive to construct the empirical score distribution during the estimati...
FPGA architecture for pairwise statistical significance estimation
Sequence comparison is one of the most fundamental computational problems in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix consisting of pairwise scores of aligning different residues with each other (like BLOSUM62), and give an alignment score for the given sequence-pair. This work addresses the problem of accurately estimating statistica...
gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences
Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...
Rapid Estimates of Statistical Significance of the Pairwise Nucleotide Sequence Alignment
Statistical significance of the similarity observed is the main question while comparing sequences. This problem has not yet been solved mathematically for optimal aligning of the sequences containing insertions and deletions. We have carried out the regression analysis of the observed similarity of random sequences depending on their length and nucleotide composition and are proposing a practi...
Publication date: 2008